Cython perf simplification #861

zerothi · 2024-11-06T19:54:26Z

Closes Pk on complex matrices #104
Added tests for new/changed functions?
Documentation for functionality in docs/
Changes documented in CHANGELOG.md

Here are some details that needs to be sorted out before merging:

Adding tests for complex spin-orbit calculations, this includes post-processing like PDOS etc.
The idea is that instead of 8 numbers, we can now do 4, [uu, dd, ud, du], this has the problem of
overlap being stored as a complex number, but that might be ok.
Document the SOC TB handling a bit better, right now the order of elements are not intuitive.

zerothi · 2024-11-06T19:55:11Z

@pfebrer you won't like this...

Sadly, pure-python mode does not allow cimport statements.

This code-restructuring is a first step to reduce the amount of Cython code.

src/sisl/__init__.py

@@ -88,13 +88,16 @@
 # import the common options used
 from ._common import *

+from ._core import *


codecov · 2024-11-06T20:06:34Z

Codecov Report

Attention: Patch coverage is 85.10638% with 56 lines in your changes missing coverage. Please review.

Project coverage is 86.97%. Comparing base (8559eda) to head (6fedebb).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
src/sisl/physics/sparse.py	44.18%	48 Missing ⚠️
src/sisl/io/siesta/_help.py	92.85%	6 Missing ⚠️
src/sisl/io/siesta/binaries.py	93.54%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #861      +/-   ##
==========================================
- Coverage   86.98%   86.97%   -0.01%     
==========================================
  Files         404      405       +1     
  Lines       52638    52792     +154     
==========================================
+ Hits        45785    45916     +131     
- Misses       6853     6876      +23

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

pfebrer · 2024-11-07T18:41:44Z

Sadly, pure-python mode does not allow cimport statements.

It does, see here:

sisl/src/sisl/geom/_neighbors/_operations.py

Lines 8 to 11 in 6bd1395

    
           import cython 
        
           import cython.cimports.numpy as cnp 
        
           import numpy as np 
        
           from cython.cimports.libc.math import sqrt

Do you mean that it doesn't allow to cimport sisl files?

zerothi · 2024-11-07T20:46:57Z

Sadly, pure-python mode does not allow cimport statements.

It does, see here:

sisl/src/sisl/geom/_neighbors/_operations.py

Lines 8 to 11 in 6bd1395

import cython

import cython.cimports.numpy as cnp

import numpy as np

from cython.cimports.libc.math import sqrt

Do you mean that it doesn't allow to cimport sisl files?

True, I misunderstood this here...

You're right, but for this to work for sisl intrinsics, you would have to duplicate interfaces in pxd files, and you wouldn't be able to run that code in pure python mode, which kind of defeats the purpose. :S

See here: http://docs.cython.org/en/latest/src/tutorial/pure.html#cimports

pfebrer · 2024-11-07T20:57:28Z

Well, of course it depends on how complex things are, but you could always mock the cimports inside an if cython.compiled: clause so that the file runs in the python interpreter.

But apart from files running out of the box in normal python (which is nice), for me it is a question of readability and the feeling that the code can be ported to something else much more easily if there is a better accelerating framework than cython.

Not saying we should translate everything to pure python, just giving some arguments in favour of the pure python syntax :)

src/sisl/io/siesta/_help.py

@@ -98,45 +99,177 @@
    csr.translate_columns(col_from, col_to)


-def _mat_spin_convert(M, spin=None):
+def _mat2dtype(M, dtype: np.dtype) -> None:


src/sisl/io/siesta/tests/test_tsde.py

This was not really intended. But the amount of different codes that were not using fused tyes was annoying and prohibited further development. I initially tried to implement pure-python mode codes. However, it seems to be impossible to do pure-python mode and cimport it into another pure-python mode code. Before this will be merged, I need to benchmark the changes made. I *hope* it is faster, but I am not sure anything happened. The main change is that the fused datatypes are omnipresent and that the ndarray data type has been removed almost completely. So now, the memoryviews are prevalent. Signed-off-by: Nick Papior <[email protected]>

This means that it is much simpler to track problems. There is no code duplication in the spin-box calculations. I am not sure whether the python-yellow annotations are only for the int/long variables, and when down-casting. In any case, I think this is more easy to manage. Signed-off-by: Nick Papior <[email protected]>

Simple benchmarks showed that siesta Hamiltonians to Hk can be much faster by changing how the folding of the matrices are done. Instead of incrementally adding elements, and searching for duplicates before each addition of elements, we know built the entire array, and use numpy.unique to reduce the actual array. This leverages the numpy unique function which already returns a sorted array. It marginally slows down csr creation of matrices with few edges per orbital (TB models). But will be much faster for larger models stemming from DFT or the likes. Tests for this commit: %timeit H.Hk() %timeit H.Hk([0.1] * 3) %timeit H.Hk(format="array") %timeit H.Hk([0.1] * 3, format="array") For a *many* edge system, we get: 67.2 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 85.4 ms ± 8.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 5.59 ms ± 426 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 11.3 ms ± 39.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) While for a *few* edge system, we get: 9.1 ms ± 52.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 9.25 ms ± 65.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 5.75 ms ± 397 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 6.17 ms ± 394 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) For commit v0.15.1-57-g6bbbde39 we get: 196 ms ± 3.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 214 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 6.58 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 12.8 ms ± 58.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) and 7.41 ms ± 77.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 7.37 ms ± 73.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 6.04 ms ± 383 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 5.81 ms ± 37 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) Signed-off-by: Nick Papior <[email protected]>

Enabling fortran is necessary for it to populate details of the fortran world. Signed-off-by: Nick Papior <[email protected]>

Signed-off-by: Nick Papior <[email protected]>

This should enable simpler access to data Signed-off-by: Nick Papior <[email protected]>

Lots of the code were built for floats. This now fixes the issue of reading/writing sparse matrices in specific data-formats. It allows a more natural way of handling SOC matrices, with complex interplay, say: H[0, 0, 2] = Hud as a complex variable, when dealing with floats one needs to do this: H[0, 0, 2] = Hud.real H[0, 0, 3] = Hud.imag which is not super-intuitive. Currently there are still *many* hardcodings of the indices. And we should strive to move these into a common framework to limit the problems it creates. Tests has been added that checks Hamiltonian eigenvalues and Density matrices mulliken charges. So it seems it works as intended, but not everything has been fully tested. Signed-off-by: Nick Papior <[email protected]>

Signed-off-by: Nick Papior <[email protected]>

fixed errors in col creation of dtypes when calling csr.diags() Ensured that sparsegeometry.finalize accepts any arguments the csr.finalize accepts. Removed casting errors in mat2dtype. This may *hide* potential problems when there are non-zero imaginary parts. We should perhaps later revisit the problem considering that TB + Peierls has complex overlap matrices. Fixed issue when a construct was called with Python intrinsic complex values. Signed-off-by: Nick Papior <[email protected]>

Signed-off-by: Nick Papior <[email protected]>

github-advanced-security bot found potential problems Nov 6, 2024

View reviewed changes

zerothi force-pushed the cython-perf-simplification branch 2 times, most recently from 23ab3b7 to 2a5ec18 Compare November 7, 2024 09:30

zerothi mentioned this pull request Nov 8, 2024

Remove dtype in the Spin object #864

Closed

zerothi force-pushed the cython-perf-simplification branch from 5014eac to 295c413 Compare November 8, 2024 12:38

github-advanced-security bot found potential problems Nov 8, 2024

View reviewed changes

zerothi force-pushed the cython-perf-simplification branch from f16ea89 to a9a5f0f Compare November 13, 2024 12:33

zerothi mentioned this pull request Nov 13, 2024

Clarification on Recent Changes to sisl.physics.Hamiltonian #869

Closed

zerothi force-pushed the cython-perf-simplification branch from a9a5f0f to 60f3e22 Compare November 25, 2024 14:16

zerothi added 12 commits November 25, 2024 15:31

fixed print-out of cmake-variables and ensured Fortran is enabled

487bb54

Enabling fortran is necessary for it to populate details of the fortran world. Signed-off-by: Nick Papior <[email protected]>

added the changelog

7293726

Signed-off-by: Nick Papior <[email protected]>

cleaned spin-box extraction of matrix data

53742cf

This should enable simpler access to data Signed-off-by: Nick Papior <[email protected]>

removed unused variable

71c2c47

Signed-off-by: Nick Papior <[email protected]>

fixed wrong object usage in mulliken extraction

b02db45

Signed-off-by: Nick Papior <[email protected]>

added explanation of how transform works

10ada5f

Signed-off-by: Nick Papior <[email protected]>

fixed nc compilation with complex numbers

95370a9

Signed-off-by: Nick Papior <[email protected]>

wrong usage of siesta2sisl

f6df3fb

Signed-off-by: Nick Papior <[email protected]>

zerothi force-pushed the cython-perf-simplification branch from e23989a to f6df3fb Compare November 25, 2024 14:32

zerothi added 2 commits November 26, 2024 11:19

fixed bug in mock netCDF4

6fedebb

Signed-off-by: Nick Papior <[email protected]>

zerothi merged commit c6253c8 into main Nov 26, 2024
16 of 17 checks passed

zerothi deleted the cython-perf-simplification branch November 26, 2024 11:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cython perf simplification #861

Cython perf simplification #861

zerothi commented Nov 6, 2024 •

edited

Loading

zerothi commented Nov 6, 2024

codecov bot commented Nov 6, 2024 •

edited

Loading

pfebrer commented Nov 7, 2024

zerothi commented Nov 7, 2024

pfebrer commented Nov 7, 2024 •

edited

Loading

Cython perf simplification #861

Cython perf simplification #861

Conversation

zerothi commented Nov 6, 2024 • edited Loading

zerothi commented Nov 6, 2024

codecov bot commented Nov 6, 2024 • edited Loading

Codecov Report

pfebrer commented Nov 7, 2024

zerothi commented Nov 7, 2024

pfebrer commented Nov 7, 2024 • edited Loading

zerothi commented Nov 6, 2024 •

edited

Loading

codecov bot commented Nov 6, 2024 •

edited

Loading

pfebrer commented Nov 7, 2024 •

edited

Loading